(54) Video image compression using weighted wavelet hierarchical vector quantization



(57) A weighted wavelet hierarchical vector quantization (WWHVQ) procedure is initiated by obtaining (12)
an N x N pixel image with 8 bits per pixel. A look-up operation (14) is performed to obtain data
representing a discrete wavelet transform (DWT) followed by a quantization of the data. Upon completion
of the look-up, a data compression will have been performed. Further stages and look-ups will result in
further compression of the data, i.e., 4:1, 8:1, 16:1, 32:1, 64:1, etc. Accordingly, a determination (16)
is made whether the compression is complete. If the compression is incomplete, further look-up is
performed. If the compression is complete, however, the compressed data is transmitted (18). Optionally,
it is determined (19) at a gateway whether further compression is required. If so, transcoding is
performed (20). The receiver receives (22) the compressed data. Subsequently, a second look-up
operation (24) is performed to obtain data representing an inverse discrete wavelet transform of the
decompressed data. After one iteration, the data is decompressed by a factor of two. Further iterations
allow for further decompression of the data. Accordingly, a determination (26) is made whether
decompression is complete. If the decompression is incomplete, further look-ups are performed; else the
procedure is ended.

Description



This invention relates to a method and apparatus for compressing a video image for transmission to a receiver
and/or decompressing the image at the receiver. More particularly, the invention is directed to an apparatus and method
for performing data compression on a video image using weighted wavelet hierarchical vector quantization (WWHVQ).
WWHVQ advantageously utilizes certain aspects of hierarchical vector quantization (HVQ) and the discrete wavelet
transform (DWT), a subband transform.

A vector quantizer (VQ) is a quantizer that maps k-dimensional input vectors into one of a finite set of k-dimensional
reproduction vectors, or codewords. An analog-to-digital converter, or scalar quantizer, is a special case in which the
quantizer maps each real number to one of a finite set of output levels. Since the logarithm (base 2) of the number of
codewords is the number of bits needed to specify the codeword, the logarithm of the number of codewords, divided by
the vector dimension, is the rate of the quantizer in bits per symbol.

A VQ can be divided into two parts: an encoder and a decoder. The encoder maps the input vector into a binary
code representing the index of the selected reproduction vector, and the decoder maps the binary code into the selected
reproduction vector.

A major advantage of ordinary VQ over other types of quantizers (e.g., transform coders) is that the decoding can
be done by a simple table lookup. A major disadvantage of ordinary VQ with respect to other types of quantizers is that
the encoding is computationally very complex. An optimal encoder performs a full search through the entire set of
reproduction vectors looking for the reproduction vector that is closest (with respect to a given distortion measure) to
each input vector.

For example, if the distortion measure is squared error, then the encoder computes the quantity $\|x - y\|^2$ for each
input vector x and reproduction vector y. This results in essentially M multiply/add operations per input symbol, where
M is the number of codewords. A number of suboptimal, but computationally simpler, vector quantizer encoders have
been studied in the literature. For a survey, see the book by Gersho and Gray, Vector Quantization and Signal
Compression, Kluwer, 1992.
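The full-search encoder and table-lookup decoder described above can be sketched as follows. This is only an
illustration under stated assumptions: the function names and the four-entry codebook are hypothetical and do not
come from the specification, which does not give a codebook.

```python
import math


def full_search_encode(x, codebook):
    """Return the index of the codeword nearest to x in squared error."""
    best_index, best_dist = 0, float("inf")
    for index, y in enumerate(codebook):
        # One multiply/add per vector component and codeword.
        dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def decode(index, codebook):
    """Decoding is a plain table lookup."""
    return codebook[index]


def rate_bits_per_symbol(num_codewords, dimension):
    """Rate = log2(number of codewords) / vector dimension."""
    return math.log2(num_codewords) / dimension


# Hypothetical 2-dimensional codebook with M = 4 codewords.
codebook = [(0, 0), (0, 255), (255, 0), (255, 255)]
index = full_search_encode((10, 240), codebook)
reproduction = decode(index, codebook)
```

The encoder loop touches every codeword for every input vector, which is the cost that the hierarchical scheme
discussed next is meant to avoid.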

Hierarchical vector quantization (HVQ) is VQ that can encode using essentially one table lookup per input symbol.
(Decoding is also done by table lookup.) To the knowledge of the inventors, HVQ has heretofore not appeared in the
literature outside of Chapter 3 of the Ph.D. thesis of P. Chang, Predictive, Hierarchical, and Transform Vector
Quantization for Speech Coding, Stanford University, May 1986, where it was used for speech. Other methods named
'hierarchical vector quantization' have appeared in the literature, but they are unrelated to the HVQ that is considered
respecting the present invention.

The basic idea behind HVQ is the following. The input symbols are finely quantized to p bits of precision. For image
data, p = 8 is typical. In principle it is possible to encode a k-dimensional vector using a single lookup into a table with
a kp-bit address, but such a table would have $2^{kp}$ entries, which is clearly infeasible if k and p are even moderately
large. HVQ performs the table lookups hierarchically. For example, to encode a k = 8 dimensional vector (whose
components are each finely quantized to p = 8 bits of precision) to 8 bits representing one of M = 256 possible
reproductions, the hierarchical structure shown in FIGURE 1a can be used, in which Tables 1, 2, and 3 each have 16-bit
inputs and 8-bit outputs (i.e., they are each 64 KByte tables).

A signal flow diagram for such an encoder is shown in FIGURE 1b. In the HVQ of FIGURE 1b, the tables T at each
stage of the encoder along with the delays Z are illustrated. Each level in the hierarchy doubles the vector dimension
of the quantizer, and therefore reduces the bit rate by a factor of 2. By similar reasoning, the ith level in the hierarchy
performs one lookup per $2^i$ samples, and therefore the total number of lookups per sample is at most 1/2 + 1/4 + 1/8
+ ... = 1, regardless of the number of levels. Of course, it is possible to vary these calculations by adjusting the
dimensions of the various tables.
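The lookup count and bit rate per level can be checked with a few lines of exact arithmetic. This is a sketch that
assumes 8-bit table outputs at every level, as in the FIGURE 1a example; the function names are illustrative only.

```python
from fractions import Fraction


def lookups_per_sample(levels):
    """Level i performs one lookup per 2**i samples; sum over all levels."""
    return sum(Fraction(1, 2 ** i) for i in range(1, levels + 1))


def bits_per_sample(levels, index_bits=8):
    """After `levels` levels, one index of index_bits bits covers 2**levels samples."""
    return Fraction(index_bits, 2 ** levels)


three_level_cost = lookups_per_sample(3)   # 1/2 + 1/4 + 1/8
three_level_rate = bits_per_sample(3)      # 8 bits for an 8-dimensional vector
```

However many levels are added, the geometric sum stays below one lookup per sample, while the rate halves at
each level.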

The contents of the HVQ tables can be determined in a variety of ways. A straightforward way is the following. With
reference to FIGURE 1a, Table 1 is simply a table-lookup version of an optimal 2-dimensional VQ. That is, an optimal
2-dimensional full search VQ with M = 256 codewords is designed by standard means (e.g., the generalized Lloyd
algorithm discussed by Gersho and Gray), and Table 1 is filled so that it assigns to each of its $2^{16}$ possible
2-dimensional input vectors the 8-bit index of the nearest codeword.

Table 2 is just slightly more complicated. First, an optimal 4-dimensional full search VQ with M = 256 codewords is
designed by standard means. Then Table 2 is filled so that it assigns to each of its $2^{16}$ possible 4-dimensional input
vectors (i.e., the cross product of all possible 2-dimensional output vectors from the first stage) the 8-bit index of its
nearest codeword. The tables for stages 3 and up are designed similarly. Note that the distortion measure is completely
arbitrary.

A discrete wavelet transformation (DWT), or more generally, a tree-structured subband decomposition, is a method
for hierarchical signal transformation. Little or no information is lost in such a transformation. Each stage of a DWT
involves filtering a signal into a low-pass component and a high-pass component, each of which is critically sampled
(i.e., down sampled by a factor of two). A more general tree-structured subband decomposition may filter a signal into
more than two bands per stage, and may or may not be critically sampled. Here we consider only the DWT, but those
skilled in the art can easily extend the relevant notions to the more general case.

With reference to FIGURE 2a, let $X = (x(0), x(1), \ldots, x(N-1))$ be a 1-dimensional input signal with finite length N.
As shown by the tree structure A, the first stage of a DWT decomposes the input signal $X_{L0} = X$ into the low-pass and
high-pass signals $X_{L1} = (x_{L1}(0), x_{L1}(1), \ldots, x_{L1}(N/2-1))$ and $X_{H1} = (x_{H1}(0), x_{H1}(1), \ldots, x_{H1}(N/2-1))$,
each of length N/2. The second stage decomposes only the low-pass signal $X_{L1}$ from the first stage into the low-pass
and high-pass signals $X_{L2} = (x_{L2}(0), x_{L2}(1), \ldots, x_{L2}(N/4-1))$ and $X_{H2} = (x_{H2}(0), x_{H2}(1), \ldots, x_{H2}(N/4-1))$,
each of length N/4. Similarly, the third stage decomposes only the low-pass signal $X_{L2}$ from the second stage into
low-pass and high-pass signals $X_{L3}$ and $X_{H3}$ of lengths N/8, and so on. It is also possible for successive stages to
decompose some of the high-pass signals in addition to the low-pass signals. The set of signals at the leaves of the
resulting complete or partial tree is precisely the transform of the input signal at the root. Thus a DWT can be regarded
as a hierarchically nested set of transforms.

To specify the transform precisely, it is necessary to specify the filters used at each stage. We consider only finite
impulse response (FIR) filters, i.e., wavelets with finite support. L is the length of the filters (i.e., number of taps), and
the low-pass filter (the scaling function) and the high-pass filter (the difference function, or wavelet) are designated by
their impulse responses, $l(0), l(1), \ldots, l(L-1)$ and $h(0), h(1), \ldots, h(L-1)$, respectively. Then at the output of the
mth stage,

$$x_{L,m}(i) = l(0)\,x_{L,m-1}(2i) + l(1)\,x_{L,m-1}(2i+1) + \cdots + l(L-1)\,x_{L,m-1}(2i+L-1)$$

$$x_{H,m}(i) = h(0)\,x_{L,m-1}(2i) + h(1)\,x_{L,m-1}(2i+1) + \cdots + h(L-1)\,x_{L,m-1}(2i+L-1)$$

for $i = 0, 1, \ldots, N/2^m - 1$. Boundary effects are handled in some expedient way, such as setting signals to zero
outside their windows of definition. The filters may be the same from node to node, or they may be different.
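The analysis equations above can be sketched directly in code. The function below filters and down samples by two in
one pass and treats the signal as zero outside its window, one of the expedients the text mentions. The 2-tap
averaging and differencing filters and the sample signal are assumptions chosen only to keep the arithmetic easy to
follow; the specification does not name particular filter coefficients.

```python
def analysis_stage(signal, low, high):
    """One DWT stage: returns (low_band, high_band), each half the input length."""
    taps = len(low)

    def sample(n):
        # Zero outside the window of definition.
        return signal[n] if 0 <= n < len(signal) else 0

    half = len(signal) // 2
    low_band = [sum(low[k] * sample(2 * i + k) for k in range(taps))
                for i in range(half)]
    high_band = [sum(high[k] * sample(2 * i + k) for k in range(taps))
                 for i in range(half)]
    return low_band, high_band


# Hypothetical 2-tap filters (L = 2) and input signal (N = 8).
low, high = [0.5, 0.5], [0.5, -0.5]
x = [1, 3, 5, 7, 2, 2, 8, 0]

xl1, xh1 = analysis_stage(x, low, high)      # first stage, length N/2
xl2, xh2 = analysis_stage(xl1, low, high)    # second stage acts on the low band only
```

Applying the same function again to `xl1` mirrors the tree of FIGURE 2a, where each later stage decomposes only the
low-pass output of the stage before it.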

The inverse transform is performed by different low-pass and high-pass filters, called reconstruction filters, applied
in reverse order. Let $l'(0), l'(1), \ldots, l'(L-1)$ and $h'(0), h'(1), \ldots, h'(L-1)$ be the impulse responses of the inverse
filters. Then $X_{L,m-1}$ can be reconstructed from $X_{L,m}$ and $X_{H,m}$ as:

$$x_{L,m-1}(2i) = l'(0)\,x_{L,m}(i) + l'(2)\,x_{L,m}(i+1) + h'(0)\,x_{H,m}(i) + h'(2)\,x_{H,m}(i+1)$$

$$x_{L,m-1}(2i+1) = l'(1)\,x_{L,m}(i+1) + l'(3)\,x_{L,m}(i+2) + h'(1)\,x_{H,m}(i+1) + h'(3)\,x_{H,m}(i+2)$$

for $i = 0, 1, \ldots, N/2^m - 1$. That is, the low-pass and high-pass bands are up sampled (interpolated) by a factor of
two, filtered by their respective reconstruction filters, and added.

Two-dimensional signals are handled similarly, but with two-dimensional filters. Indeed, if the filters are separable,
then the filtering can be accomplished by first filtering in one dimension (say horizontally along rows), then filtering in
the other dimension (vertically along columns). This results in the hierarchical decompositions illustrated in FIGURES
2b, showing tree structure B, and 2c, in which the odd stages operate on rows, while the even stages operate on
columns. If the input signal $X_{L0}$ is an N x N image, then $X_{L1}$ and $X_{H1}$ are N x (N/2) images, $X_{LL2}$, $X_{LH2}$, $X_{HL2}$,
and $X_{HH2}$ are (N/2) x (N/2) images, and so forth.

Moreover, notwithstanding that which is known about HVQ and DWT, a wide variety of video image compression
methods and apparatuses have been implemented. One existing method that addresses transcoding problems is the
algorithm of J. Shapiro, 'Embedded Image Coding using Zerotrees of Wavelet Coefficients,' IEEE Transactions on
Signal Processing, December 1993, in which transcoding can be done simply by stripping off prefixes of codes in the
bit stream. However, this algorithm trades simple transcoding for computationally complex encoding and decoding.

Other known methods lack certain practical and convenient features. For example, these other known video
compression methods do not allow a user to access the transmitted image at different quality levels or resolutions during
an interactive multicast over multiple rate channels in a simplified system wherein encoding and decoding are
accomplished solely by the performance of table lookups.

More particularly, using these other non-embedded encoding video compression algorithms, when a multicast (or
simulcast, as applied in the television industry) of a video stream is accomplished over a network, either every receiver
of the video stream is restricted to a certain quality (and hence bandwidth) level at the sender, or bandwidth (and CPU
cycles or compression hardware) is unnecessarily used by multicasting a number of streams at different bit rates.

In video conferencing (multicast) over a heterogeneous network comprising, for example, ATM, the Internet, ISDN
and wireless, some form of transcoding is typically accomplished at the 'gateway' between sender and receiver when
a basic rate mismatch exists between them. One solution to the problem is for the 'gateway'/receiver to decompress
the video stream and recompress and scale it according to internal capabilities. This solution, however, is not only
expensive but also increases latency by a considerable amount. The transcoding is preferably done in an online fashion
(with minimal latency/buffering) due to the interactive nature of the application and to reduce hardware/software costs.

From a user's perspective, the problem is as follows: (a) Sender(i) wants to send a video stream at K bits/sec to M
receivers; (b) Receiver(j) wants to receive Sender(i)'s video stream at L bits/sec (L < K); but (c) the image dimensions
that Receiver(j) desires or is capable of processing are smaller than the default dimensions that Sender(i) encoded.

It is desirable that any system and/or method to address these problems in interactive video advantageously
incorporate (1) inexpensive transcoding from a higher to a lower bit rate, preferably by only operating on a compressed
stream, (2) simple bit rate control, (3) simple scalability of dimension at the destination, (4) symmetry (resulting in very
inexpensive decode and encode), and (5) a prioritized compressed stream in addition to acceptable rate-distortion
performance. None of the current standards (Motion JPEG, MPEG and H.261) possess all of these characteristics. In
particular, no current standard has a facility to transcode from a higher to a lower bit rate efficiently. In addition, all are
computationally expensive.

The present invention seeks to overcome the aforenoted and other problems and to incorporate the desired
characteristics noted above. It is particularly directed to the art of video data compression, and will thus be described with
specific reference thereto. It is appreciated, however, that the invention will have utility in other fields and applications.

The present invention provides a method for compressing and transmitting data, the method comprising steps of:
receiving the data; successively performing multiple stages of first lookup operations to obtain compressed data at each
stage representing vector quantized discrete subband coefficients; and, transmitting the compressed data to a receiver.

The method may comprise successively performing i levels of a first table lookup operation to obtain, for example,
compressed data representing a subband transform, e.g., DWT, of the input data followed by vector quantization
thereof.

In accordance with another aspect of the present invention, the compressed data (which may also be transcoded
at a gateway) is transmitted to a receiver.

In accordance with another aspect of the present invention, the compressed data is received at a receiver and
multiple stages of a second table lookup operation are performed to selectively obtain decompressed data representing
at least a partial inverse subband transform of the compressed data.

The invention further provides an apparatus for carrying out the methods as set forth above or in accordance with
any of the embodiments described herein.

One advantage of the present invention is that encoding and decoding are accomplished solely by table lookups. 
This results in very efficient implementations. For example, this algorithm enables 30 frames/sec encoding (or decoding)
of CIF (320x240) resolution video on Sparc 2 class machines with just 50% CPU loading.

Another advantage of the present invention is that, since only table lookups are utilized, the hardware implemented 
to perform the method is relatively simple. An address generator and a limited number of memory chips accomplish the
method. The address generator could be a microsequencer, a gate array, an FPGA or a simple ASIC.

The present invention exists in the construction, arrangement, and combination of the various parts of the device,
whereby the objects contemplated are attained as hereinafter more fully set forth, specifically pointed out in the claims,
and illustrated by way of exemplary embodiments in the accompanying drawings in which:

FIGURE 1a illustrates a table structure of prior art HVQ;

FIGURE 1b is a signal flow diagram illustrating prior art HVQ for speech coding;

FIGURES 2a-c are a graphical representation of a prior art DWT;

FIGURE 3 is a flowchart representing the preferred method of the present invention;

FIGURES 4a-b are graphical representations of a single stage of an encoder performing a WWHVQ in the method
of FIGURE 3;

FIGURE 4c is a signal flow diagram illustrating an encoder performing a WWHVQ in the method of FIGURE 3;

FIGURE 5 is a signal flow diagram of a decoder used in the method of FIGURE 3;

FIGURE 6 is a block diagram of a system using 3-D subband coding in connection with the method of FIGURE 3;

FIGURE 7 is a block diagram of a system using frame differencing in connection with the method of FIGURE 3;

FIGURE 8 is a schematic representation of the hardware implementation of the encoder of the method of FIGURE 3;

FIGURE 9 is a schematic representation of another hardware implementation of the encoder of the method of
FIGURE 3;

FIGURE 10 is a schematic representation of the hardware implementation of the decoder of the method of
FIGURE 3; and

FIGURE 11 is a schematic representation of another hardware implementation of the decoder of the method of
FIGURE 3.

Referring now to the drawings wherein the showings are for the purposes of illustrating the preferred embodiments
of the invention only and not for purposes of limiting same, FIGURE 3 provides a flowchart of the overall preferred
embodiment. It is recognized that the method is suitably implemented in the structure disclosed in the following preferred
embodiment and operates in conjunction with software based control procedures. However, it is contemplated that the
control procedure be embodied in other suitable mediums.

As shown, the WWHVQ procedure is initiated by obtaining, or receiving in the system implementing the method,
input data representing an N x N pixel image where 8 bits represent a pixel (steps 10 and 12). A look-up operation is
performed to obtain data representing a subband transform followed by a vector quantization of the data (step 14). In
the preferred embodiment, a discrete wavelet transform comprises the subband transform. However, it is recognized
that other subband transforms will suffice. Upon completion of the look-up, a data compression has been performed.
Preferably, such compression is 2:1. Further stages will result in further compression of the data, e.g., 4:1, 8:1, 16:1,
32:1, 64:1, etc. It is appreciated that other compression ratios are possible and may be desired in certain applications.
Successive compression stages, or iterations of step 14, compose the hierarchy of the WWHVQ. Accordingly, a
determination is made whether the compression is complete (step 16). If the compression is incomplete, further look-up
is performed. If the desired compression is achieved, however, the compressed data is transmitted using any known
transmitter (step 18). It is then determined at, for example, a network gateway, whether further compression is required
(step 19). If so, transcoding is performed in an identical manner as the encoding (step 20). In any event, the receiver
eventually receives the compressed data using known receiving techniques and hardware (step 22). Subsequently, a
second look-up operation is performed to obtain data representing an inverse subband transform, preferably an inverse
DWT, of decompressed data (step 24). After one complete stage, the data is decompressed. Further stages allow for
further decompression of the data to a desired level. A determination is then made whether decompression is complete
(step 26). If the decompression is incomplete, further look-ups are performed. If, however, the decompression is
complete, the WWHVQ procedure is ended (step 28).

The embodiment of FIGURE 3 uses a hierarchical coding approach, advantageously incorporating particular
features of hierarchical vector quantization (HVQ) and the discrete wavelet transform (DWT). HVQ is extremely fast,
though its performance directly on images is mediocre. On the other hand, the DWT is computationally demanding,
though it is known to greatly reduce blocking artifacts in coded images. The DWT coefficients can also be weighted to
match the human visual sensitivity in different frequency bands. This results in even better performance, since giving
higher weights to the more visually important bands ensures that they will be quantized to a higher precision. The
present invention combines HVQ and the DWT in a novel manner, to obtain the best qualities of each in a single system,
Weighted Wavelet HVQ (WWHVQ).

The basic idea behind WWHVQ is to perform the DWT filtering using table lookups. Assume the input symbols have
already been finely quantized to p bits of precision. For monochrome image data, p = 8 is typical. (For color image data,
each color plane can be treated separately, or they can be vector quantized together into p bits.) In principle it is possible
to filter the data with an L-tap filter with one lookup per output symbol using a table with an Lp-bit address space. Indeed,
in principle it is possible to perform both the low-pass filtering and the high-pass filtering simultaneously, by storing both
low-pass and high-pass results in the table. Of course, such a table is clearly infeasible if L and p are even moderately
large. We take an approach similar to that of HVQ: for a filter of length L, $\log_2 L$ 'substages' of table lookup can be used,
if each table has 2p input bits and p output bits. The p output bits of the final substage represent a 2-dimensional vector:
one symbol from the low-pass band and a corresponding symbol from the high-pass band. Thus, the wavelet coefficients
output from the table are vector quantized. In this sense the DWT is tightly integrated with the HVQ. The wavelet
coefficients at each stage of the DWT are vector quantized, as are the intermediate results (after each substage) in the
computation of the wavelet coefficients by table lookup.
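A quick calculation shows why the single-table approach is infeasible and what the substage arrangement costs
instead. This is only a sketch: the function names are illustrative, and the filter length is assumed to be a power of
two so that the base-2 logarithm is an integer.

```python
import math


def direct_table_entries(taps, bits):
    """Entries in a single table addressed by all L inputs of p bits each."""
    return 2 ** (taps * bits)


def substage_count(taps):
    """Number of substages, log2(L), assuming L is a power of two."""
    return int(math.log2(taps))


def substage_table_entries(bits):
    """Entries in one substage table with 2p input bits."""
    return 2 ** (2 * bits)


single = direct_table_entries(4, 8)      # L = 4, p = 8: a 32-bit address space
per_table = substage_table_entries(8)    # 16-bit address, one p-bit output each
substages = substage_count(4)
```

For L = 4 and p = 8, the direct table would need about 4.3 billion entries, whereas each substage table has 65,536
one-byte entries.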

FIGURE 4a shows one stage i of the WWHVQ, organized as $\log_2 L$ substages of table lookup. Here, L = 4, so that
the number of substages is two. Note that the filters slide over by two inputs to compute each output. This corresponds
to downsampling (decimation) by a factor of 2, and hence a reduction in bit rate by a factor of 2. The second stage of the
WWHVQ operates on coded outputs from the first stage, again using $\log_2 L$ substages of table lookup (but with different
tables than in the first stage), and so on for the following stages. It is recognized that each of a desired number of stages
operates in substantially the same way so that the p bits at the output of stage i represent a $2^i$:1 compression. The p
bits at the output of the final stage can be transmitted directly (or indirectly via a variable-length code). The transmitted
data can be further compressed, or 'transcoded', for example, at a gateway between high and low capacity networks,
simply by further stages of table lookup, each of which reduces the bit rate by a factor of 2. Hence both encoding and
transcoding are extremely simple.
The tables in the first stage (i = 1) may be designed as follows, with reference to FIGURE 4a. In this discussion, we
shall assume that the input signal $X = (x(0), x(1), \ldots, x(N-1))$ is one-dimensional. Those skilled in the art will have little
difficulty generalizing the discussion to image data. First decompose the input signal $X_{L0} = X$ into low-pass and high-pass
signals $X_{L1}$ and $X_{H1}$, each of length N/2. This produces a sequence of 2-dimensional vectors

$$x(i) = [x_{L1}(i),\, x_{H1}(i)], \quad i = 0, 1, \ldots, N/2 - 1,$$

where

$$x_{L1}(i) = l(0)\,x(2i) + l(1)\,x(2i+1) + l(2)\,x(2i+2) + l(3)\,x(2i+3)$$

and

$$x_{H1}(i) = h(0)\,x(2i) + h(1)\,x(2i+1) + h(2)\,x(2i+2) + h(3)\,x(2i+3).$$

Such 2-dimensional vectors are used to train an 8-bit vector quantizer $Q_{1.2}$ for Table 1.2. Likewise, 2-dimensional vectors

$$[\,l(0)\,x(2i) + l(1)\,x(2i+1),\; h(0)\,x(2i) + h(1)\,x(2i+1)\,]$$

are used to train an 8-bit vector quantizer $Q_{1.1a}$ for Table 1.1a, and 2-dimensional vectors

$$[\,l(2)\,x(2i+2) + l(3)\,x(2i+3),\; h(2)\,x(2i+2) + h(3)\,x(2i+3)\,]$$

are used to train an 8-bit vector quantizer $Q_{1.1b}$ for Table 1.1b. All three quantizers are trained to minimize the expected
weighted squared error distortion measure $d(x,y) = [w_{L1}(x_0 - y_0)]^2 + [w_{H1}(x_1 - y_1)]^2$, where the constants $w_{L1}$ and
$w_{H1}$ are proportional to the human perceptual sensitivity (i.e., inversely proportional to the just noticeable contrast) in
the low-pass and high-pass bands, respectively. Then Table 1.1a is filled so that it assigns to each of its $2^{16}$ possible
2-dimensional input vectors $[x_0, x_1]$ the 8-bit index of the codeword $[y_{L1.1a}, y_{H1.1a}]$ of $Q_{1.1a}$ nearest to
$[l(0)x_0 + l(1)x_1,\; h(0)x_0 + h(1)x_1]$ in the weighted squared error sense. Table 1.1b is filled so that it assigns to each of
its $2^{16}$ possible 2-dimensional input vectors $[x_2, x_3]$ the 8-bit index of the codeword $[y_{L1.1b}, y_{H1.1b}]$ of $Q_{1.1b}$ nearest
to $[l(2)x_2 + l(3)x_3,\; h(2)x_2 + h(3)x_3]$ in the weighted squared error sense. And finally Table 1.2 is filled so that it assigns
to each of its $2^{16}$ possible 4-dimensional input vectors (i.e., the cross product of all possible 2-dimensional output
vectors from the first substage), for example, $[y_{L1.1a}, y_{H1.1a}, y_{L1.1b}, y_{H1.1b}]$, the 8-bit index of the codeword
$[y_{L1}, y_{H1}]$ of $Q_{1.2}$ nearest to $[y_{L1.1a} + y_{L1.1b},\; y_{H1.1a} + y_{H1.1b}]$ in the weighted squared error sense.

For a small cost in performance, it is possible to design the tables so that Table 1.1a and Table 1.1b are the same,
for instance, Table 1.1, if i = 1 in FIGURE 4b. In this case, Table 1.1 is simply a table lookup version of a 2-dimensional
VQ that best represents pairs of inputs $[x_0, x_1]$ in the ordinary (unweighted) squared error sense. Then Table 1.2 is filled
so that it assigns to each of its $2^{16}$ possible 4-dimensional input vectors (i.e., the cross product of all possible
2-dimensional output vectors from the first substage), for example, $[y_0, y_1, y_2, y_3]$, the 8-bit index of the codeword
$[y_{L1}, y_{H1}]$ of $Q_{1.2}$ nearest to $[l(0)y_0 + l(1)y_1 + l(2)y_2 + l(3)y_3,\; h(0)y_0 + h(1)y_1 + h(2)y_2 + h(3)y_3]$ in the weighted
squared error sense. Making Tables 1.1a and 1.1b the same would result in a savings of both table memory and
computation, as shown in FIGURE 4b. The corresponding signal flow diagram is shown in FIGURE 4c.
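The shared-table variant can be illustrated with a toy table-filling sketch. Everything concrete here is an assumption
made for brevity: inputs are quantized to 2 bits rather than 8, both codebooks have four hand-picked codewords rather
than 256 trained ones, and the filter taps and perceptual weights are invented. Only the filling rule itself (nearest
codeword in the weighted squared error sense, applied to the filtered first-substage reproductions) follows the text.

```python
from itertools import product


def nearest(target, codebook, weights):
    """Index of the codeword nearest to target in weighted squared error."""
    def dist(y):
        return sum((w * (t - c)) ** 2 for w, t, c in zip(weights, target, y))
    return min(range(len(codebook)), key=lambda j: dist(codebook[j]))


def fill_table_1_1(levels, codebook_1_1):
    """Table 1.1: unweighted VQ of every possible input pair (x0, x1)."""
    return {(x0, x1): nearest((x0, x1), codebook_1_1, (1, 1))
            for x0, x1 in product(range(levels), repeat=2)}


def fill_table_1_2(codebook_1_1, codebook_1_2, low, high, weights):
    """Table 1.2: maps a pair of Table 1.1 indices to a [low, high] codeword index."""
    table = {}
    for a, b in product(range(len(codebook_1_1)), repeat=2):
        y = codebook_1_1[a] + codebook_1_1[b]      # (y0, y1, y2, y3)
        target = (sum(l * v for l, v in zip(low, y)),
                  sum(h * v for h, v in zip(high, y)))
        table[(a, b)] = nearest(target, codebook_1_2, weights)
    return table


def encode_block(x, table_1_1, table_1_2):
    """Encode four inputs with two substage-1 lookups and one substage-2 lookup."""
    a = table_1_1[(x[0], x[1])]
    b = table_1_1[(x[2], x[3])]
    return table_1_2[(a, b)]


# Hypothetical toy parameters (not from the specification).
levels = 4                                             # p = 2 bits per input
low = [0.25, 0.25, 0.25, 0.25]                         # assumed low-pass taps
high = [0.25, 0.25, -0.25, -0.25]                      # assumed high-pass taps
codebook_1_1 = [(0, 0), (0, 3), (3, 0), (3, 3)]
codebook_1_2 = [(0.0, 0.0), (1.5, 1.5), (1.5, -1.5), (3.0, 0.0)]
weights = (2.0, 1.0)                                   # assumed w_L1, w_H1

table_1_1 = fill_table_1_1(levels, codebook_1_1)
table_1_2 = fill_table_1_2(codebook_1_1, codebook_1_2, low, high, weights)
code = encode_block((3, 3, 0, 0), table_1_1, table_1_2)
```

All filtering arithmetic happens while the tables are filled; `encode_block` itself performs only lookups, which is the
property the scheme relies on. In a real design the codebooks would come from training (for example, the generalized
Lloyd algorithm mentioned earlier) rather than being written by hand.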

Referring again to FIGURE 4a, the tables in the second stage (i = 2) are just slightly more complicated. Decompose
the input signal $X_{L1}$, of length N/2, into low-pass and high-pass signals $X_{L2}$ and $X_{H2}$, each of length N/4. This produces
a sequence of 4-dimensional vectors $x(i) = [x_{L2}(i), x_{H2}(i), x_{H1}(2i), x_{H1}(2i+1)]$, for $i = 0, 1, \ldots, N/4 - 1$, where
$x_{L2}(i) = l(0)x_{L1}(2i) + l(1)x_{L1}(2i+1) + l(2)x_{L1}(2i+2) + l(3)x_{L1}(2i+3)$ and
$x_{H2}(i) = h(0)x_{L1}(2i) + h(1)x_{L1}(2i+1) + h(2)x_{L1}(2i+2) + h(3)x_{L1}(2i+3)$.
Such 4-dimensional vectors are used to train an 8-bit vector quantizer $Q_{2.2}$ for Table 2.2. Likewise, 4-dimensional
vectors $[l(0)x_{L1}(2i) + l(1)x_{L1}(2i+1),\; h(0)x_{L1}(2i) + h(1)x_{L1}(2i+1),\; x_{H1}(2i),\; x_{H1}(2i+1)]$ are used to train an 8-bit
vector quantizer $Q_{2.1a}$ for Table 2.1a, and 4-dimensional vectors
$[l(2)x_{L1}(2i+2) + l(3)x_{L1}(2i+3),\; h(2)x_{L1}(2i+2) + h(3)x_{L1}(2i+3),\; x_{H1}(2i+2),\; x_{H1}(2i+3)]$ are used to train an
8-bit vector quantizer $Q_{2.1b}$ for Table 2.1b. All three quantizers are trained to minimize the expected weighted squared
error distortion measure
$d(x,y) = [w_{L2}(x_0 - y_0)]^2 + [w_{H2}(x_1 - y_1)]^2 + [w_{H1}(x_2 - y_2)]^2 + [w_{H1}(x_3 - y_3)]^2$,
where the constants $w_{L2}$, $w_{H2}$ and $w_{H1}$ are proportional to the human perceptual sensitivities in
their respective bands. Then Table 2.1a is filled so that it assigns to each of its $2^{16}$ possible 4-dimensional input vectors
$[y_0, y_1, y_2, y_3]$ the 8-bit index of the codeword of $Q_{2.1a}$ nearest to $[l(0)y_0 + l(1)y_1,\; h(0)y_0 + h(1)y_1,\; y_2,\; y_3]$ in the
weighted squared error sense, and so on for Tables 2.1b and 2.2 in the second stage, and the tables in any succeeding
stages.

As in HVQ, in WWHVQ the vector dimension doubles with each succeeding stage. The formats of the vectors at
each stage are shown as outputs of the encoder 30, graphically represented in FIGURE 4c. These formats are for the
case of the two-dimensional separable DWT shown in FIGURE 2b.

Referring now to the case where all tables in a given substage are identical as in FIGURES 4b and 4c, if the number
of inputs to a stage is S bytes, then the number of outputs is:

$$S/2$$

and the number of table lookups per output is $\log L$. If the compression ratio is C, then the total number of outputs
(including the outputs of intermediate stages) is

$$N^2(C-1)/C$$

for an image of size $N^2$. Thus, the total number of table lookups is:

$$N^2(C-1)\log L / C$$

If the amount of storage needed for the HVQ encoder is T, then the amount of storage needed for the WWHVQ encoder
is $T \log L$ per stage.
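As a worked instance of these counts, the sketch below evaluates them for a 256 x 256 image compressed 64:1 with
4-tap filters. Two assumptions are made: the logarithm is taken to base 2 (consistent with the $\log_2 L$ substages
described earlier), and the compression ratio is a power of two reached by halving at every stage. The function names
are illustrative.

```python
import math


def encoder_outputs(n, ratio):
    """Total outputs over all stages: N^2 (C - 1) / C."""
    return n * n * (ratio - 1) // ratio


def encoder_lookups(n, ratio, taps):
    """Total table lookups: outputs times log2(L) lookups per output."""
    return encoder_outputs(n, ratio) * int(math.log2(taps))


def outputs_by_stage(n, stages):
    """Each stage halves the number of symbols: N^2/2, N^2/4, ..."""
    return [n * n // 2 ** i for i in range(1, stages + 1)]


total_outputs = encoder_outputs(256, 64)
total_lookups = encoder_lookups(256, 64, 4)
per_stage = outputs_by_stage(256, 6)     # six stages give 2**6 = 64:1
```

Summing the per-stage outputs reproduces the closed form, since N^2/2 + N^2/4 + ... + N^2/C = N^2(C-1)/C.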

Also shown in FIGURE 4c are the respective delays Z which successively increase with each stage. The oval symbol
including the ↓2 designation indicates that only one of every two outputs is selected, as those skilled in the art will
appreciate.

The WWHVQ decoder 50 which performs steps 22-26 of FIGURE 3 is shown in FIGURE 5. All tables of decoder
50 in a given substage are identical, similar to the encoder of FIGURES 4b and 4c. As those skilled in the art will
appreciate, the odd-even split tables 52, 54 at each stage handle the interpolation by 2 that is part of the DWT
reconstruction. If L = 4 and the filter coefficients are h(i) and l(i), i = 0, 1, 2, 3, then the odd table 52 computes
h(1)x(i) + h(3)x(i+1) and l(1)x(i) + l(3)x(i+1), while the even table 54 computes h(0)x(i) + h(2)x(i+1) and
l(0)x(i) + l(2)x(i+1), where the x(i)'s are the inputs to that stage. If the number of inputs to a stage is S, then the number
of outputs from that stage is 2S. The total number of table lookups for the stage is:

$$S \log(L/2)$$

If the compression ratio is C, then the total number of outputs (including outputs of intermediate stages) is:

$$2N^2(C-1)/C$$

for an image of size $N^2$. Thus, the total number of table lookups is:

$$\left(N^2(C-1)/C\right)\log(L/2)$$

If the amount of storage needed for the HVQ encoder is T, then the amount of storage needed for the WWHVQ decoder
per stage is:

$$T(\log(L/2) + 1) = T \log L$$

All the storage requirements presented are for 8 bits per pixel input. For color images (YUV, 4:2:2 format) the storage
requirements double. Similarly the number of table lookups also doubles for color images.

As shown in FIGURES 6 and 7, there are two options for handling motion and interframe coding using the present
method. In a first mode (FIGURE 6), the subband coding is extended and followed by a vector quantization scheme that
allows for intraframe coding performed as described in connection with encoder 30 to interframe coding, designated by
reference numeral 80. This is similar to 3-D subband coding.

The second way (FIGURE 7) of handling motion is to use a simple frame differencing scheme that operates on the
compressed data and uses an ordered codebook to decide whether to transmit a certain address. Specifically, the
WWHVQ encoder 30 shown in FIGURE 4c is used in conjunction with a frame differencer 70. The frame differencer 70
uses a current frame encoded by encoder 30 and a previous frame 72 as inputs to obtain a result.

Some of the features of the WWHVQ of the present invention are:

Transcoding 

The sender transmits a video stream compressed at 16:1. The receiver requests a 32:1 stream from the 'gateway'
(or the gateway sees that the slower link can only handle 32:1). All the gateway has to do to achieve this transcoding
is a further level of table lookup using the data it receives (at 16:1) as input. This is extremely useful, especially when
a large range of bandwidths has to be supported, as, for example, in a heterogeneous networked environment.
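
A minimal Python sketch of the gateway step follows. It assumes that one extra lookup stage maps each pair of adjacent codewords to a single codeword, halving the stream. The function `transcode`, the 4-entry codebook, and the averaging rule that fills `toy_table` are placeholders and not the trained tables of the method.

```python
def transcode(codes, table):
    """Halve a codeword stream by applying one further lookup stage.

    `table` maps a pair of adjacent codewords to one codeword (hypothetical layout).
    """
    if len(codes) % 2:
        raise ValueError("expected an even number of codewords")
    return [table[(codes[i], codes[i + 1])] for i in range(0, len(codes), 2)]


# Toy table over a 4-entry codebook: (a, b) -> (a + b) // 2.
toy_table = {(a, b): (a + b) // 2 for a in range(4) for b in range(4)}

stream_16_to_1 = [0, 2, 3, 3, 1, 2]
stream_32_to_1 = transcode(stream_16_to_1, toy_table)
print(stream_32_to_1)  # [1, 3, 1]
```

The gateway never decodes to pixels in this sketch; it only indexes a table with data it already holds.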

Dimension Scaling 

If the video stream is compressed up to J stages, then the receiver has a choice of ⌊J/2⌋ + 1 image sizes without any
extra effort. For example, if a 256 x 256 image is compressed to 6 stages (i.e., 64:1), then the receiver can reconstruct
a 32 x 32, 64 x 64, 128 x 128 or 256 x 256 image without any overhead. This is done by just using the low-pass
bands LL available at the even-numbered stages (see FIGURE 3). Also, since the whole method is based upon table
look-ups, it is very easy to scale the image up, i.e., the interpolation and low-pass filtering can be done ahead of time.
Note also that all this is done on the compressed bit-stream itself. In standard video compression schemes, both down
and up scaling are achieved by explicitly filtering and decimating or interpolating the input or output image.
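
The set of sizes available to the receiver can be enumerated with a few lines of Python. The sketch assumes that every two stages (one row pass and one column pass) halve each image dimension, which is consistent with the 256 x 256, 6-stage example; the function name is hypothetical.

```python
def available_sizes(n, stages):
    """Side lengths recoverable from an n x n image compressed through `stages` stages."""
    # One size per even-numbered stage, plus the full-size image.
    return [n >> k for k in range(stages // 2, -1, -1)]


print(available_sizes(256, 6))  # [32, 64, 128, 256]
```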

Motion

There are two simple options for handling motion. Simple frame differencing can be applied to the compressed data
itself. Another option is to use a 3-D subband coding scheme in which the temporal dimension is handled using another
WWHVQ in conjunction with the spatial WWHVQ. The wavelet used here can be different (it is different in practice). In
a preferred implementation, motion detection and thresholding are accomplished. This is done on the compressed data
stream. The current compressed frame and the previous compressed frame are compared using a table lookup. This
generates a binary map which indicates places where motion has occurred. This map is used to decide which blocks
(codewords) to send, and a zero is inserted in place of stationary blocks. This whole process is done in an online manner
with a run-length encoder. The table that is used to create the binary map is constructed off-line and the motion threshold
is set at that time.
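
The frame-differencing path can be sketched in Python as follows. The sketch assumes an ordered codebook in which the distance between two codeword indices approximates the distance between the blocks they represent, so the off-line motion table can be built by thresholding index differences. It also assumes codeword 0 is reserved as the "stationary" marker. All names and the toy threshold are illustrative.

```python
def build_motion_table(size, threshold):
    """Off-line step: table[cur][prev] is True where motion exceeds the threshold."""
    return [[abs(a - b) > threshold for b in range(size)] for a in range(size)]


def motion_encode(current, previous, motion_table):
    """Return run-length pairs (value, count) for the motion-filtered frame."""
    runs = []
    for cur, prev in zip(current, previous):
        moved = motion_table[cur][prev]    # binary map entry via table lookup
        value = cur if moved else 0        # zero replaces stationary blocks
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1               # extend the current run
        else:
            runs.append([value, 1])        # start a new run
    return [tuple(r) for r in runs]


table = build_motion_table(8, threshold=1)
print(motion_encode([5, 5, 2, 7], [5, 4, 6, 7], table))
# [(0, 2), (2, 1), (0, 1)]
```

Only the third block changed by more than the threshold, so it is the only codeword sent; the stationary blocks collapse into runs of zeros.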

Dithering

Typically, the decoder has to do color space conversion and possibly dithering (for < 24-bit displays) on the decoded
stream before it can display it. This is quite an expensive task compared to the complexity of the WWHVQ decoder. But
since the WWHVQ decoder is also based upon table lookups, these steps are incorporated into the decoder output
tables. Other than speeding up the decoder, the other major advantage of this technique is that it makes the decoder
completely independent of the display (resolution, depth, etc.). Thus, the same compressed stream can be decoded by
the same decoder on two different displays by just using different output tables.
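
The display independence described above can be pictured with a small Python sketch in which the final decoder stage is a lookup into a display-specific output table. The two tables below (a 24-bit RGB table and a 1-bit monochrome table) and the two-entry index alphabet are invented for illustration.

```python
def decode_for_display(indices, output_table):
    """Final decoder stage: map decoded indices straight to display values."""
    return [output_table[i] for i in indices]


# Hypothetical output tables with conversion and dithering folded in off-line.
rgb24_table = {0: (0, 0, 0), 1: (255, 255, 255)}
mono1_table = {0: 0, 1: 1}

decoded = [0, 1, 1]
print(decode_for_display(decoded, rgb24_table))  # [(0, 0, 0), (255, 255, 255), (255, 255, 255)]
print(decode_for_display(decoded, mono1_table))  # [0, 1, 1]
```

The decoding function is identical in both calls; only the table changes.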

The WWHVQ method is very simple and inexpensive to implement in hardware. Basically, since the method
predominantly uses only table lookups, only memory and address generation logic are required. The architectures described
below differ only in the amount of memory they use versus the amount of logic. Since alternate stages operate on row
and column data, there is a need for some amount of buffering between stages. This is handled explicitly in the
architectures described below. All the storage requirements are given for 8 bits per pixel input. For color images (YUV, 4:2:2
format) the storage requirements double. But the timing information remains the same, since the luminance and
chrominance paths can be handled in parallel in hardware. Also, as noted above, two simple options are available for interframe
coding in WWHVQ. The 3-D subband coding option is implemented as an additional WWHVQ module, while the frame
differencing option is implemented as a simple comparator.

Referring now to FIGURE 8, each one of the tables i.1 (or i.1a and i.1b) and i.2 of the present invention is mapped
onto a memory chip (64KB in this case). The row-column alternation between stages is handled by using a buffer 60 of
NL bytes between stages, where N is the row dimension of the image and L is the length of the wavelet filter. For example,
between stages 1 and 2 this buffer is written in row format (by stage 1) and read in column format by stage 2. Some
simple address generation logic is required. Accordingly, address generator 90 is provided. The address generator 90
is any suitable device such as an incrementer, an accumulator, or an adder plus some combinational glue logic. However,
this architecture is almost purely memory. The input image is fed as the address to the first memory chip, whose output
is fed as the address to the next memory chip, and so on.
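
The flow-through idea, in which each memory output becomes the next memory address, can be mimicked in Python. The sketch assumes that two 8-bit values are concatenated into a 16-bit address, which is one way a 64KB table could be indexed. The averaging rule used to fill the toy table is a placeholder for the real precomputed contents, and the inter-stage row/column buffer is omitted.

```python
def make_toy_table():
    """Build a 64KB table; entry = average of the two bytes forming the address."""
    table = bytearray(1 << 16)
    for addr in range(1 << 16):
        table[addr] = ((addr >> 8) + (addr & 0xFF)) // 2
    return table


def lookup_stage(values, table):
    """One stage: each pair of 8-bit inputs forms a 16-bit address into the table."""
    return [table[(values[i] << 8) | values[i + 1]]
            for i in range(0, len(values) - 1, 2)]


def flow_through(values, tables):
    """Chain the stages: the output of one table addresses the next."""
    for table in tables:
        values = lookup_stage(values, table)
    return values


toy = make_toy_table()
print(flow_through([10, 20, 30, 50], [toy, toy]))  # [27]
```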

The total memory requirements for the encoder 30 and the decoder 50 are T log L + NL(M - 1) bytes, where M is
the number of stages. For example, the WWHVQ encoder and decoder shown in FIGURES 3 and 5 need 64KB for each
table i.1, i.2 plus the buffer memory 60. If the number of stages, M, is 6, the wavelet filter size, L, is 4 and the image row
dimension, N, is 256, then the amount of memory needed is 768KB + 5KB = 773KB. The number of 64KB chips needed
is 12 and the number of 1KB chips needed is 5. The throughput is obviously maximal, i.e., one output every clock cycle.
The latency per stage is NL clocks, except for the first stage, which has a latency of 1 clock cycle. Thus the latency after
m stages is (m - 1)NL + 1 clocks.
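
The worked example above can be reproduced with a short Python calculation. The sketch assumes one 64KB table chip per level per stage (log L levels, base-2 logarithm), which gives the 12 chips quoted; the function and parameter names are illustrative.

```python
import math


def flow_through_costs(table_kb, stages, filter_len, rows):
    """Return (table_chips, table_bytes, buffer_bytes, latency_clocks)."""
    levels = int(math.log2(filter_len))            # log L tables per stage
    chips = stages * levels
    table_bytes = chips * table_kb * 1024          # T log L
    buffer_bytes = rows * filter_len * (stages - 1)    # NL(M - 1)
    latency = (stages - 1) * rows * filter_len + 1     # (m - 1)NL + 1 with m = M
    return chips, table_bytes, buffer_bytes, latency


chips, table_bytes, buffer_bytes, latency = flow_through_costs(64, 6, 4, 256)
print(chips, table_bytes // 1024, buffer_bytes // 1024, latency)  # 12 768 5 5121
```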

The main advantage of this architecture is that it requires almost no glue logic and is a simple flow-through
architecture. The disadvantages are that the number of chips required scales with the number of stages and the board area
required becomes quite large. The capacity of each chip in this architecture is quite small and one can easily buy a
cheap memory chip which has the capacity of all these chips combined. This is considered in the architecture of FIGURE
9.

Referring now to FIGURE 9, if all the tables are loaded onto one memory chip 100, then an address generator 102
and a frame buffer 104 are used. In this architecture one stage of the WWHVQ is computed at a time. In fact, each
sub-level of a stage is computed one at a time. So in the encoder 30 the table lookups for table i.1 are done first, and the
N^2/2 values that result are stored in the (half) frame buffer 104. These values are then used as input for computing the table
lookups of table i.2, and so on. Clearly, the most frame storage needed is:

N^2/2

bytes. The frame buffer 104 also permits simple row-column alternation between stages.

The address generator 102 has to generate two addresses, one for the table memory chip 106 and another for the
frame memory chip 104. The address for the frame memory chip 104 is generated by a simple incrementer, since the
access is uniform. If the number of taps in the wavelet filter is L, then the number of levels per stage (i.e., wavelet tables)
for the encoder 30 is log L (it is log(L/2) for the decoder). Each of the tables i.1 and i.2 is of size tsize. Suppose the output of
level q of stage m is being computed. If the output of the frame memory 104 is y(i), then the address that the address generator
102 computes for the table memory 106 is y(i) + offset, where offset = [(m - 1) log L + (q - 1)] * tsize. This assumes that the
tables i.1, i.2 are stored in an ordered manner. The multiplications involved in the computation of offset need not be
performed, since offset can be maintained as a running total: each time one level of computation is completed, offset = offset
+ tsize. Thus, the address generator 102 is any suitable device such as an incrementer, an accumulator, or an adder
plus some combinational glue logic.
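
The equivalence between the closed-form offset and the running total can be checked with a short Python sketch. Here `tsize` stands for the per-table size, logarithms are base 2, and the function names are hypothetical; the inline assertion confirms that adding `tsize` after each level reproduces ((m - 1) log L + (q - 1)) times `tsize` without any multiplication in the loop.

```python
import math


def table_offsets(stages, filter_len, tsize):
    """Yield (stage m, level q, offset) using only a running total."""
    levels = int(math.log2(filter_len))    # log L levels per stage (encoder)
    offset = 0
    result = []
    for m in range(1, stages + 1):
        for q in range(1, levels + 1):
            # Closed form from the text, used here only as a check.
            assert offset == ((m - 1) * levels + (q - 1)) * tsize
            result.append((m, q, offset))
            offset += tsize                # one accumulate per completed level
    return result


def table_address(y, offset):
    """Address sent to the table memory for frame-memory output y."""
    return y + offset


print(table_offsets(2, 4, 65536))
# [(1, 1, 0), (1, 2, 65536), (2, 1, 131072), (2, 2, 196608)]
```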

The total memory requirement for this architecture is:

T log L + N^2/2

bytes spread out over two chips, the frame memory 104 and the table memory 106. For the example considered in the
previous section this translates to 800KB of memory (768KB + 32KB). The throughput is one output every clock cycle.
The latency per stage is:

(N^2 / 2^(m-1)) log L,

where m is the stage number. The first stage has a latency of just log L. Thus, the latency after m stages is:

N^2 (1 - 1/2^(m-1)) log L + 1
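
The figures for the single-chip architecture can be evaluated with the Python sketch below. It takes T = 384KB, the value implied by T log L = 768KB with L = 4, uses base-2 logarithms, and follows the latency expression as printed, including its trailing + 1. The function name is an assumption.

```python
import math


def single_chip_costs(t_bytes, n, filter_len, m):
    """Return (total_memory_bytes, latency_clocks_after_m_stages)."""
    log_l = int(math.log2(filter_len))
    memory = t_bytes * log_l + n * n // 2                   # T log L + N^2/2
    # N^2 (1 - 1/2^(m-1)) log L + 1, kept in integer arithmetic.
    latency = (n * n - n * n // 2 ** (m - 1)) * log_l + 1
    return memory, latency


memory, latency = single_chip_costs(384 * 1024, 256, 4, 6)
print(memory // 1024, latency)  # 800 126977
```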

The advantages of this architecture are its scalability, simplicity and low chip count. By using a large memory chip
100 (approximately 2MB or more), various configurations of the encoder and the decoder, i.e., various table sizes and
precision, may be considered, and the number of stages can be scaled up or down without having to change anything
on the board. The only disadvantage is latency, but in practice the latency is well below 50 milliseconds, which is the
threshold above which humans start noticing delays.

It is important to note the connection between the requirement of a half frame memory 104 and the latency. The
latency is there primarily due to the fact that all the outputs of a stage are computed before the next stage computation
begins. These intermediate outputs are stored in the frame memory 104. The reason the latency was low in the previous
architecture was that the computation proceeded in a flow-through manner, i.e., it began computing the outputs of stage
m before all the outputs of stage m - 1 were computed.

FIGURE 10 illustrates the architecture for the decoder similar to the encoder of FIGURE 8. As shown, an address
generator 90' is connected to a buffer 60' having inputs of the odd and even tables 52 and 54. The output of the buffer
connects to the next stage and the address generator also connects to the next buffer.

FIGURE 11 shows the architecture for the decoder similar to the encoder of FIGURE 9. As illustrated, the interframe
coder 80' is simply placed at the input to the chip 100', as opposed to the output.

The memory chips utilized to facilitate the look-up tables of the present invention are preferably read only memories
(ROMs). However, it is recognized that other suitable memory means such as PROM, EPROM, EEPROM, RAM, etc.
may be utilized.



Claims 

1. A method for compressing and transmitting data, the method comprising steps of:

receiving the data; 

successively performing multiple stages of first lookup operations to obtain compressed data at each stage
representing vector quantized discrete subband coefficients; and,
transmitting the compressed data to a receiver.

2. The method according to claim 1 further comprising:

receiving the compressed data at the receiver; 

successively performing multiple stages of second lookup operations to selectively obtain at each stage before
a last stage decompressed data representing a partial inverse subband transform of the compressed data.

3. The method of claim 1 or 2, wherein i lookup stages are performed and the compressed data is 2^i:1 compressed
data, where i is an integer.

4. The method according to claim 1, 2 or 3, wherein the subband transform coefficients comprise discrete wavelet
transform coefficients.

5. The method of any of claims 1 to 4, further comprising transcoding the compressed data after the transmitting at a
gateway to the receiver to obtain further compressed data.

6. A method adaptable for use on signal data compressed by performing a subband transform followed by a vector
quantization of the signal data, comprising steps of:

receiving the compressed signal data; 

successively performing multiple stages of lookup operations to selectively obtain at each stage before a last
stage partially decompressed data representing a partial subband transform of the compressed signal data.

7. The apparatus for compressing and transmitting data, comprising:

means for receiving the data; 

means for successively performing multiple stages of first lookup operations to obtain compressed data at
each stage representing vector quantized discrete subband coefficients; and,
means for transmitting the compressed data to a receiver.

8. The apparatus according to claim 7, further comprising:

means for receiving the compressed data at the receiver; and

means for successively performing multiple stages of second lookup operations to selectively obtain at each
stage before a last stage decompressed data representing a partial inverse subband transform of the compressed
data.

9. An apparatus adaptable for use on signal data compressed by performing a subband transform followed by a vector
quantization of the signal data, comprising:

means for receiving the compressed signal data;

means for successively performing multiple stages of lookup operations to selectively obtain at each stage
before a last stage partially decompressed data representing a partial subband transform of the compressed signal
data.

10. A programmable apparatus for compressing and receiving data, when suitably programmed for carrying out the
method of any of claims 1 to 6.
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